Term-weighting approaches in automatic text retrieval
Similar resources
Term-Weighting Approaches in Automatic Text Retrieval
The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained...
Supporting Text Retrieval by Typographical Term Weighting
Text documents stored in information systems usually consist of more information than the pure concatenation of words, i.e., they also contain typographic information. Because conventional text retrieval methods evaluate only the word frequency, they miss the information provided by typography, e.g., regarding the importance of certain terms. In order to overcome this weakness, we present an ap...
An Investigation of Term Weighting Approaches for Microblog Retrieval
The use of effective term frequency weighting and document length normalisation strategies has been shown over a number of decades to have a significant positive effect for document retrieval. When dealing with much shorter documents, such as those obtained from microblogs, it would seem intuitive that these would have less benefit. In this paper we investigate their effect on microblog retrie...
RALI: Automatic Weighting of Text Window Distances
Systems using text windows to model word contexts have mostly been using fixed-sized windows and uniform weights. The window size is often selected by trial and error to maximize task results. We propose a non-supervised method for selecting weights for each window distance, effectively removing the need to limit window sizes, by maximizing the mutual generation of two sets of samples of the sa...
Imbalanced text classification: A term weighting approach
The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented and their classifiers often perform far below satisfactory. We tackle this problem using a simple probability based term weighting scheme to better distinguish documents in minor categories. This new scheme directly utilizes two critical information rati...
Journal
Journal title: Information Processing & Management
Year: 1988
ISSN: 0306-4573
DOI: 10.1016/0306-4573(88)90021-0